Improving knowledge distillation using unified ensembles of specialized teachers

Authors

Abstract

The increasing complexity of deep learning models has led to the development of Knowledge Distillation (KD) approaches, which enable us to transfer knowledge between a very large network, called the teacher, and a smaller, faster one, called the student. However, as recent evidence suggests, using powerful teachers often negatively impacts the effectiveness of the distillation process. In this paper, the reasons behind this apparent limitation are studied and an approach that transfers knowledge more efficiently is proposed. To this end, multiple highly specialized teachers are employed, each one for a small set of skills, overcoming the aforementioned limitation, while also achieving high efficiency by diversifying the ensemble. At the same time, the employed ensemble is formulated in a unified structure, making it possible to simultaneously train all models. The proposed method is demonstrated on three different image datasets, leading to improved performance, even when compared with state-of-the-art ensemble-based methods.
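As a rough illustration of the general idea (a minimal sketch, not the paper's exact unified formulation), a student can be distilled from several teachers by matching its softened outputs to their averaged soft predictions while also fitting the hard labels. The function, model names, temperature, and weighting factor below are all hypothetical placeholders.

```python
# Minimal sketch of distilling a student from an ensemble of teachers
# (hypothetical hyper-parameters; the paper's unified training of
# specialized teachers is not reproduced here).
import torch
import torch.nn.functional as F


def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               temperature=4.0, alpha=0.5):
    """Combine cross-entropy on hard labels with a KL term toward the
    averaged softened predictions of several teacher networks."""
    # Average the teachers' softened class probabilities.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)

    # Student's softened log-probabilities at the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)

    # Scale the KL term by T^2, as in Hinton et al. (2015).
    kd_loss = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2

    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In practice each teacher in the list would be a network trained on (or specialized for) a different subset of classes or skills, and the weighting between the distillation and label terms is tuned per dataset.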


Similar references

Efficient Knowledge Distillation from an Ensemble of Teachers

This paper describes the effectiveness of knowledge distillation using teacher-student training for building accurate and compact neural networks. We show that with knowledge distillation, information from multiple acoustic models like very deep VGG networks and Long Short-Term Memory (LSTM) models can be used to train standard convolutional neural network (CNN) acoustic models for a variety of...


Knowledge distillation using unlabeled mismatched images

Current approaches for Knowledge Distillation (KD) either directly use training data or sample from the training data distribution. In this paper, we demonstrate the effectiveness of 'mismatched' unlabeled stimuli to perform KD for image classification networks. For illustration, we consider scenarios where there is a complete absence of training data, or a mismatched stimulus has to be used for augm...


Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Improving the distillation energy network

Energy costs are the largest percentage of a hydrocarbon plant's operating expenditures. This is especially true of the distillation process, which requires substantial energy consumption. Concerns over recent high costs and economic pressures continually emphasise the need for efficient distillation design and operation without a loss of performance. This article illustrates how energy-effici...


Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy

Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems — the models (often deep networks or wide net...



Journal

Journal title: Pattern Recognition Letters

Year: 2021

ISSN: 1872-7344, 0167-8655

DOI: https://doi.org/10.1016/j.patrec.2021.03.014